fix(ci): comprehensive overhaul of release-drafter setup #15623

Open
jamesfredley wants to merge 2 commits into 7.0.x from fix/release-drafter-overhaul

@jamesfredley
Contributor

@jamesfredley jamesfredley commented May 2, 2026

Summary

Comprehensive overhaul of the Release - Drafter setup that fixes a long-standing set of compounding issues across 7.0.x, 7.1.x, 7.2.x, and 8.0.x. Lands first on 7.0.x and is intended to be merged forward into the higher branches in the usual cascade.

The drafter has been quietly broken for months. Symptoms users have hit:

  • The 8.0.x draft was never created (no draft exists for 8.0.x today).
  • Recent runs queued for 1,400-2,000+ minutes before being cancelled.
  • Release notes occasionally bumped from stale baselines (e.g. v7.0.10 instead of v7.0.11) because the latest releases were excluded from the "last release" lookup.
  • No v7.0.N+1 / v7.1.N+1 / v8.0.0-MN+1 draft was ever maintained during an in-flight ASF vote, because the in-vote release was marked prerelease=true and filtered out of the drafter's "last release" detection. The drafter cannot start the next-version draft until a maintainer manually unmarks the prerelease flag post-vote, which defeats the whole point of an always-fresh draft.
  • continue-on-error: true made all of the above invisible: every run reported "success" even when it produced nothing.

This PR addresses all nine root causes in a single change so the next merge cascade gives every release branch a working drafter at once.

Root causes and fixes

1. Concurrency lock with release.yml (the day-long queue delays)

release-notes.yml and release.yml both used release-pipeline-${branch}. release.yml has manual approval gates (environment: release, environment: docs, environment: sdkman) which routinely keep a release run in waiting state for days until a maintainer approves the next stage. Every push to a release branch during that window queued behind the waiting release run.

Evidence from the workflow history:

| Run | Branch | Duration | Result |
| --- | --- | --- | --- |
| 25214035979 | 7.0.x | 1,400 min | cancelled |
| 25197284620 | 7.1.x-stop-4x-exceptionlogging | 2,091 min | cancelled |
| 25167124818 | 7.0.x | 1,358 min | cancelled |

Fix: switch to release-drafter-${branch} with cancel-in-progress: true. The drafter and release.yml never touch the same release object - the drafter targets the next-version draft (e.g. v7.0.12), release.yml the currently-published tag (e.g. v7.0.11) - so splitting the groups is safe.
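
For reference, a sketch of the resulting concurrency block in release-notes.yml. The group expression is the one shown in the diff under review; the comments are added here for illustration.

```yaml
# .github/workflows/release-notes.yml (fragment)
concurrency:
  # Dedicated group per release branch, no longer shared with release.yml.
  group: release-drafter-${{ github.event.pull_request.base.ref || github.ref_name }}
  # A newer push or PR event supersedes an in-flight drafter run
  # instead of queuing behind it.
  cancel-in-progress: true
```

Because the drafter rebuilds the draft from scratch on every run, cancelling an older run in favour of the newest one should lose nothing.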

2. Prereleases excluded from "last release" detection (the missing 8.0.x draft AND the broken release cascade)

This is the single biggest fix. It is the reason no draft exists for 8.0.x, AND it is what enables the "draft N+1 while N is being voted on" cascade that we have wanted since the drafter was introduced.

Every Apache Grails release - v7.0.11, v7.1.1, v8.0.0-M1, ... - is published on GitHub with prerelease=true during the ASF vote process. Milestones and RCs (v8.0.0-M1) stay prerelease=true permanently; final releases (v7.0.11) are unmarked after the vote passes. release-drafter's default include-pre-releases=false filters all of them out when finding the "last release" while they are still flagged, which means:

  • 7.0.x bumped from v7.0.10 (last non-prerelease) instead of v7.0.11.

  • 7.1.x bumped from v7.1.0 instead of v7.1.1.

  • 8.0.x had no last release at all (only v8.0.0-M1 exists, excluded as a prerelease) - so the action fell back to walking the entire 265-release commit history and exhausted the GitHub API rate limit (visible verbatim in the workflow logs):

    Found 265 releases
    No draft release found
    No last release found
    Fetching parent commits of 8.0.x...
    ##[error]Request failed due to following response errors:
     - API rate limit already exceeded for site ID installation.
    

Simulation of the new filter pipeline against today's release set:

| Branch | Current lastRelease | After fix |
| --- | --- | --- |
| 7.0.x | v7.0.10 (wrong) | v7.0.11 |
| 7.1.x | v7.1.0 (wrong) | v7.1.1 |
| 7.2.x | NONE (rate limit) | NONE (bounded by initial-commits-since) |
| 8.0.x | NONE (rate limit) | v8.0.0-M1 |

Fix: include-pre-releases: true in .github/release-drafter.yml.
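
As a minimal sketch, the change amounts to one key in the drafter config. The rest of the file (autolabeler, categories, version-resolver, template) is omitted here and assumed unchanged.

```yaml
# .github/release-drafter.yml (fragment)
# Let releases still flagged prerelease=true (in-vote releases, milestones, RCs)
# count when release-drafter looks for the "last release" to bump from.
include-pre-releases: true
```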

Net effect: the staged-release cascade now Just Works

This is the behaviour we have wanted from the drafter all along. Concretely, for the next vote on 7.0.x:

| State | What release-drafter does (after this PR) |
| --- | --- |
| v7.0.11 published, no in-flight vote | Maintains draft v7.0.12 against v7.0.11. |
| v7.0.12 staged as prerelease=true, ASF vote running | v7.0.12 becomes the "last release"; drafter starts maintaining draft v7.0.13 against it. |
| v7.0.12 vote passes, prerelease flag dropped | Drafter continues maintaining v7.0.13 (same baseline, just unflagged). |
| v7.0.12 vote fails, prerelease deleted | Drafter falls back to v7.0.11 as last release; the draft becomes v7.0.12 again automatically. |

No custom "next version calculator" plugin or external workflow is needed - release-drafter plus the existing version-resolver (which already reads type: major / minor / patch PR labels and defaults to patch) handles the next-version selection natively once it can see prereleases.

3. Unbounded git history walk exhausting the API rate limit

When no last release is found, release-drafter walks parent commit history through GraphQL, paging until the entire history is consumed.

Fix: initial-commits-since: '2026-04-29T00:00:00Z' (just before the most recent releases). This is consulted only when no last release matches the filters - so it does not affect 7.0.x/7.1.x/8.0.x (which now find their last release thanks to fix 2 above), and it bounds 7.2.x's walk to days instead of years.
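
A minimal sketch of the corresponding config entry, using the date given above. The comment wording is illustrative and not copied from the actual file.

```yaml
# .github/release-drafter.yml (fragment)
# Date floor for the commit walk. Only consulted when no last release
# survives the filters (7.2.x today), so branches that do find a last
# release are unaffected.
initial-commits-since: '2026-04-29T00:00:00Z'
```

The value is quoted so YAML keeps it as a string rather than parsing it as a timestamp.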

4. release-drafter v7.2.0 bug

initial-commits-since was silently ignored when set only in release-drafter.yml (not also as a workflow input) - exactly the path we use. Fixed upstream in release-drafter#1593, shipped in v7.2.1.

Fix: pin to release-drafter@563bf132657a13ded0b01fcb723c5a58cdd824e2 (v7.2.1).

5. Floating action tag on 7.0.x

7.0.x used release-drafter@v7 while 8.0.x was already pinned to a SHA. This violates the ASF security policy enforced in #15523.

Fix: SHA pin on every branch via the merge cascade.
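
The pin from fixes 4 and 5 can be sketched as the step below. The SHA and version are the ones quoted above, and id: drafter is the id this PR adds. The step name and the full release-drafter/release-drafter action path are assumptions, since the workflow file itself is not quoted here.

```yaml
# .github/workflows/release-notes.yml (fragment)
steps:
  - name: Draft release notes   # hypothetical step name
    id: drafter
    # Full commit SHA instead of the floating @v7 tag; the trailing comment
    # records which release the SHA corresponds to.
    uses: release-drafter/release-drafter@563bf132657a13ded0b01fcb723c5a58cdd824e2 # v7.2.1
```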

6. continue-on-error swallowed all failures silently

A transient or permanent drafter failure looked identical to a successful run; that is precisely why this broken state went unnoticed for months.

Fix: keep continue-on-error (so transient API blips do not turn every PR check red) but add an explicit verification step that:

  • Logs a structured workflow summary with the draft id, tag, name, and URL on success.
  • Emits a ::warning:: annotation and a workflow-summary warning when the action produced no draft id (the failure mode that made this invisible).
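
A rough sketch of what such a verification step could look like, not the exact step in this PR. It assumes continue-on-error is set at step level, and that the action exposes id, tag_name, name, and html_url outputs (check these names against the pinned version). The step names and message wording are hypothetical.

```yaml
# .github/workflows/release-notes.yml (fragment)
steps:
  - name: Draft release notes            # hypothetical step name
    id: drafter
    continue-on-error: true              # assumed to be step-level
    uses: release-drafter/release-drafter@563bf132657a13ded0b01fcb723c5a58cdd824e2 # v7.2.1

  - name: Verify draft was produced      # hypothetical step name
    if: always()
    env:
      # Output names are assumptions; confirm against the action's action.yml.
      DRAFT_ID: ${{ steps.drafter.outputs.id }}
      DRAFT_TAG: ${{ steps.drafter.outputs.tag_name }}
      DRAFT_NAME: ${{ steps.drafter.outputs.name }}
      DRAFT_URL: ${{ steps.drafter.outputs.html_url }}
    run: |
      if [ -z "$DRAFT_ID" ]; then
        # Annotation on the run plus a visible entry in the workflow summary.
        echo "::warning::release-drafter produced no draft id"
        echo "### Warning: no draft release was produced" >> "$GITHUB_STEP_SUMMARY"
      else
        {
          echo "### Draft release updated"
          echo "- id: $DRAFT_ID"
          echo "- tag: $DRAFT_TAG"
          echo "- name: $DRAFT_NAME"
          echo "- url: $DRAFT_URL"
        } >> "$GITHUB_STEP_SUMMARY"
      fi
```

Passing the outputs through env rather than interpolating them into the script keeps release names and tags from being evaluated as shell.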

7. Workflow ran on every PR

pull_request: types: [opened, reopened, synchronize, labeled] matched PRs targeting any branch, including feature-to-feature PRs that can never affect a release.

Fix: branches: ['[0-9]+.[0-9]+.x'] on the trigger.
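
Put together, the trigger block would look roughly like this. The pull_request types and the branch pattern are the ones quoted above; the push trigger is assumed to be carried over unchanged from the pre-PR workflow.

```yaml
# .github/workflows/release-notes.yml (fragment)
on:
  push:
    branches:
      - '[0-9]+.[0-9]+.x'
  pull_request:
    types: [opened, reopened, synchronize, labeled]
    # For pull_request, branches: matches the PR's base branch, so
    # feature-to-feature PRs no longer trigger the drafter.
    branches:
      - '[0-9]+.[0-9]+.x'
```

The pattern uses GitHub's filter glob syntax, where + means "one or more of the preceding character", so it matches 7.0.x and 10.12.x alike.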

8. issues trigger

Drafter has nothing to do with issue lifecycle; the trigger fired on every issue close/reopen.

Fix: removed.

9. Cross-branch leakage prevention (defense in depth)

Combine three filters so release-drafter cannot match a release from a different release line either when looking for the "last release" to bump from or for an existing draft to update:

  • filter-by-commitish: true (release's target_commitish must equal the branch).
  • filter-by-range: ~MAJOR.MINOR.0 (computed dynamically from the branch name in the workflow step).
  • tag-prefix: v (release's tag must start with v).
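
One way the dynamic range could be derived is sketched below. This is an assumption-laden illustration: the step name and id, and the way filter-by-range is passed to the action (here as a with: input), are hypothetical; only the option names and the ~MAJOR.MINOR.0 shape come from the list above.

```yaml
# .github/workflows/release-notes.yml (fragment)
steps:
  - name: Compute release range          # hypothetical step name
    id: range                            # hypothetical step id
    env:
      BRANCH: ${{ github.event.pull_request.base.ref || github.ref_name }}
    run: |
      # Strip the trailing ".x" and append ".0": 7.0.x -> ~7.0.0
      echo "range=~${BRANCH%.x}.0" >> "$GITHUB_OUTPUT"

  - name: Draft release notes            # hypothetical step name
    id: drafter
    uses: release-drafter/release-drafter@563bf132657a13ded0b01fcb723c5a58cdd824e2 # v7.2.1
    with:
      # Assumed wiring; the real workflow may pass the range differently.
      filter-by-range: ${{ steps.range.outputs.range }}
```

filter-by-commitish: true and tag-prefix: v stay in .github/release-drafter.yml, so the workflow only has to supply the one value that depends on the branch name.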

Files changed

  • .github/release-drafter.yml
    • Added tag-prefix: v, include-pre-releases: true, initial-commits-since.
    • Existing config (filter-by-commitish: true, autolabeler, categories, version-resolver, template) unchanged.
  • .github/workflows/release-notes.yml
    • Removed issues: trigger.
    • Restricted pull_request trigger to release branches.
    • Switched concurrency group to release-drafter-${branch} with cancel-in-progress: true.
    • Pinned release-drafter to v7.2.1 by commit SHA.
    • Added id: drafter and a verification step that surfaces missing-draft failures via workflow summary and annotations.
    • Heavily commented so future maintainers do not have to re-derive the rationale for any of the above.

How to verify after merge

  1. Watch the next push to 7.0.x trigger the drafter and complete in seconds (no longer queued behind release.yml).
  2. Confirm a draft release v7.0.12 (or whatever the version-resolver picks) appears at https://github.com/apache/grails-core/releases with target_commitish=7.0.x.
  3. After the merge cascade reaches 8.0.x, confirm a v8.0.0-M2 (or v8.0.0) draft appears with target_commitish=8.0.x - this is the missing draft from the original report.
  4. Cascade smoke test on the next release: when v7.0.12 is staged for vote, confirm that within a few PR merges the drafter has rolled the existing v7.0.12 draft forward to a v7.0.13 draft, automatically, with no manual intervention.
  5. If a draft ever fails to appear again, the workflow summary will now contain a clearly visible warning instead of silently passing.

Out of scope

  • The 9 historical v7.0.x releases whose target_commitish is refs/heads/7.0.x instead of 7.0.x are benign: find-previous-releases.ts strips refs/heads/ before comparing, so they still match filter-by-commitish: true. No cleanup needed.
  • The 16 older 3.x/4.x releases with SHA target_commitish cannot match any current branch filter and are correctly excluded.

Assisted-by: claude-code:claude-4.6-opus

Fixes a long-standing set of issues with the Release - Drafter workflow that
collectively prevented per-branch draft releases from working correctly
across 7.0.x, 7.1.x, 7.2.x, and 8.0.x. Symptoms included drafter runs queued
for hours behind the release pipeline, the 8.0.x draft never being created,
and release notes silently bumping from stale baselines.

Root causes addressed:

1. Concurrency lock with release.yml: release-notes.yml shared the
   "release-pipeline-${branch}" concurrency group with release.yml, whose
   manual approval gates routinely keep release runs in the "waiting" state
   for days. Drafter runs queued behind those gates routinely lasted
   1400-2000+ minutes before being cancelled, leaving drafts stale. Fix:
   switch to "release-drafter-${branch}" group with cancel-in-progress so
   the latest push wins. Drafter and release.yml never touch the same
   release object: drafter targets the next-version draft, release.yml the
   currently-published tag, so splitting the groups is safe.

2. Prereleases excluded from "last release" detection: every Apache Grails
   release (v7.0.11, v7.1.1, v8.0.0-M1, ...) is published with
   prerelease=true on GitHub during the ASF vote process. With
   release-drafter's default include-pre-releases=false, those releases
   were filtered out when looking for the "last release", so 7.0.x bumped
   from v7.0.10 instead of v7.0.11 and 8.0.x reported "No last release
   found" (because v8.0.0-M1 was the only release on that branch and was
   excluded), falling back to walking the entire 265-release commit
   history and exhausting the GitHub API rate limit. Fix: set
   include-pre-releases: true.

3. Unbounded git history walk exhausting the rate limit: when no last
   release was found, release-drafter walked unbounded parent commit
   history calling the GraphQL API per commit. Fix: set
   initial-commits-since to bound the walk to a recent date floor for
   branches with no prior matching release (e.g. 7.2.x today).

4. Release-drafter v7.2.0 bug: initial-commits-since was silently ignored
   when set only in release-drafter.yml (not also as a workflow input).
   Fix: pin to v7.2.1 (commit 563bf132657a13ded0b01fcb723c5a58cdd824e2)
   which ships the fix from upstream PR #1593.

5. Floating action tag on 7.0.x: 7.0.x used release-drafter@v7 while
   8.0.x was pinned to a SHA. Fix: pin all branches to the v7.2.1 commit
   SHA per ASF security policy (PR #15523).

6. continue-on-error swallowed all failures silently: a transient or
   permanent drafter failure looked identical to a successful run, which
   is why the broken state went unnoticed for so long. Fix: keep
   continue-on-error so PR checks stay green for transient API blips, but
   add an explicit verification step that loudly logs a workflow-summary
   warning and a GitHub Actions annotation when no draft id was produced.

7. Workflow ran on PRs targeting any branch, including feature-to-feature
   PRs that can never affect a release. Fix: restrict the pull_request
   trigger to PRs whose base ref is a release branch.

8. issues trigger ran on every issue close/reopen even though release
   drafting has nothing to do with issue lifecycle. Fix: removed.

9. Cross-branch leakage prevention: combine filter-by-commitish: true,
   filter-by-range: ~MAJOR.MINOR.0 (derived dynamically from the branch
   name) and tag-prefix: v so release-drafter can never match a release
   from a different release line when looking for either the "last
   release" to bump from or an existing draft to update.

The fix is being landed on 7.0.x and will be merged forward into 7.1.x,
7.2.x, and 8.0.x. The 8.0.x branch's existing release-drafter@5de93583
pin (v7.2.0) will be superseded by the new v7.2.1 pin during merge.

Assisted-by: claude-code:claude-4.6-opus
Copilot AI review requested due to automatic review settings May 2, 2026 15:42
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

 concurrency:
-  group: release-pipeline-${{ github.event.pull_request.base.ref || github.ref_name }}
-  cancel-in-progress: false
+  group: release-drafter-${{ github.event.pull_request.base.ref || github.ref_name }}
Contributor

I don't think this is correct. We want the concurrency group to be shared with the release workflow; otherwise release drafter could modify a release that is in the middle of being released. We do not want that to ever happen.

@bito-code-review

The change separates the concurrency group from 'release-pipeline-${branch}' to 'release-drafter-${branch}' to prevent drafter runs from queuing behind long-waiting release pipeline jobs with manual approvals. The comment explains drafter updates a draft for the next version (e.g., v7.0.12), while release.yml handles the current published tag (e.g., v7.0.11), so they don't conflict. Sharing concurrency isn't needed since they target different releases.

.github/workflows/release-notes.yml

concurrency:
  group: release-drafter-${{ github.event.pull_request.base.ref || github.ref_name }}
  cancel-in-progress: true

@jdaugherty
Contributor

> The change separates the concurrency group from 'release-pipeline-${branch}' to 'release-drafter-${branch}' to prevent drafter runs from queuing behind long-waiting release pipeline jobs with manual approvals. The comment explains drafter updates a draft for the next version (e.g., v7.0.12), while release.yml handles the current published tag (e.g., v7.0.11), so they don't conflict. Sharing concurrency isn't needed since they target different releases.
>
> .github/workflows/release-notes.yml
>
> concurrency:
>   group: release-drafter-${{ github.event.pull_request.base.ref || github.ref_name }}
>   cancel-in-progress: true

This is incorrect. The creation of the tag itself was enough to historically trigger the release drafter workflow and thus a release being performed would overlap the release drafter run too.

@jamesfredley
Contributor Author

@jdaugherty Pulled the workflow files and run history to verify. The specific mechanism you described - tag creation triggering the drafter - doesn't match what's in the code or the run logs, but the broader overlap concern is real, just through a different path. Receipts below.

1. The drafter trigger never matched tags

The pre-PR release-notes.yml on 7.0.x had:

on:
  push:
    branches:
      - '[0-9]+.[0-9]+.x'

GitHub Actions' push event with a branches: filter does not fire on tag pushes - tag pushes require an explicit tags: clause, and there isn't one. Pushing refs/tags/v7.0.* alone could not trigger the drafter.

2. The pre-release action pushes only a tag ref, never a branch commit

apache/grails-github-actions/pre-release@asf (the script invoked from release.yml) does:

git checkout "v${VERSION}"                       # detached HEAD at the tag
...
git commit -m "[skip ci] Release v${VERSION}"    # commit lives on detached HEAD
git tag -fa v${VERSION} ...
git push origin "v${VERSION}" --force            # ONLY the tag ref is pushed

The commit never lands on 7.0.x. Only the tag is force-pushed, and [skip ci] in the message would suppress workflows on a branch push anyway.

3. The three runs cited in the PR description prove it

| Run | Event | Trigger |
| --- | --- | --- |
| 25214035979 (1,400 min) | push to 7.0.x | Merge of PR #15616 (wrapper-cli-verification) |
| 25197284620 (2,091 min) | pull_request | fix 4x exception logging |
| 25167124818 (1,358 min) | pull_request | Where Query documentation improvements |

Zero of these are tag-creation events. All three are normal branch / PR activity that happened to land during a release window.

Where the overlap concern is right

Drafter and release.yml did overlap, just not through tags. The actual path:

  1. Normal PRs keep merging to 7.0.x during the 72-hour+ ASF vote.
  2. Each merge fires the drafter on push: branches:.
  3. Meanwhile release.yml is parked in waiting state on its environment: release / docs / sdkman manual-approval gates.
  4. Both workflows shared release-pipeline-${branch}, so the drafter queued behind the waiting release run for a day or more (1,358 to 2,091 minutes in the three runs above) until the run was cancelled.

bito's framing - "they target different releases, so the concurrency group does not need to be shared" - is correct as the rationale for splitting the groups. The bug was not that drafter and release.yml touched the same release object (they never did); it was that the shared concurrency group held the drafter hostage to the manual-approval wait window.

@jamesfredley
Contributor Author

Follow-up: how this PR also fixes the "draft 7.0.7 while 7.0.6 is being voted on" cascade

This came up offline as a longstanding pain point with the drafter, so worth pinning it explicitly. include-pre-releases: true (one of the three filter changes in this PR) is the entire fix on the drafter side.

Why it was broken

Our ASF release flow stages every release on GitHub with prerelease=true for the duration of the 72-hour vote. The release-drafter default is include-pre-releases: false, so during the vote window the in-flight release is invisible to the action. Consequences:

  • 7.1.x and 7.0.x: drafter would bump from the previous stable (e.g. v7.0.10), try to draft v7.0.11, but v7.0.11 already existed as the in-vote prerelease. Either the draft never moved forward or it collided with the in-vote tag.
  • 8.0.x: v8.0.0-M1 is permanently prerelease=true (milestone convention), so the drafter has never been able to see any release on 8.0.x. That is why no 8.0.x draft exists today.

Net effect: there was no draft for "the next version" while a vote was running, which is exactly the moment a maintainer wants to look at one.

Why it now works

With include-pre-releases: true, release-drafter sees prereleases when picking the "last release". Combined with the existing version-resolver (which defaults to a patch bump and reads type: major / minor / patch PR labels), the cascade is fully automatic:

v7.0.11 published, no in-flight vote          -> drafter maintains draft v7.0.12
v7.0.12 staged (prerelease=true), vote open   -> drafter sees v7.0.12 as last release
                                              -> drafter maintains draft v7.0.13
v7.0.12 vote passes, prerelease flag dropped  -> same baseline, drafter keeps maintaining v7.0.13
v7.0.12 vote fails, prerelease tag deleted    -> drafter falls back to v7.0.11
                                              -> draft becomes v7.0.12 again, automatically

Equivalent flow on 8.0.x:

v8.0.0-M1 (prerelease=true, permanent) -> drafter maintains draft v8.0.0-M2
                                          (or v8.0.0-RC1 / v8.0.0 with the right type: label)

Minor caveat on milestone branches: semver.inc('8.0.0-M1', 'patch') is 8.0.0 (strip prerelease, no version bump), and semver.inc('8.0.0-M1', 'minor') is also 8.0.0 because the patch component is already 0. A type: minor label alone therefore does not produce M2, so M1 -> M2 transitions need the draft tag manually renamed. Once a non-milestone 8.0.x release is cut, the cascade behaves the same as on 7.0.x.

Where the existing apache/grails-github-actions next-version logic fits

For completeness - there is a "calculate the next version" script in the Apache org: apache/grails-github-actions/post-release/increment_version.sh (-M major, -m minor, -p patch, with M/RC handling). It is invoked from release.yml's close job after a successful vote, and its job is to write ${NEXT}-SNAPSHOT into gradle.properties for the next dev cycle. That is a different concern from "what version should the next GitHub Release draft be tagged as", which is what release-drafter handles via version-resolver.

Both default to a patch bump, so they agree by default. If we ever want them to disagree (e.g. cut a minor instead of a patch), we already have two coordinated mechanisms:

  • Snapshot bump in gradle.properties: override the call to increment_version.sh (or pass an explicit RELEASE_VERSION to post-release).
  • Draft tag in release-drafter: label one of the merged PRs with type: minor (or type: major) so version-resolver picks the matching bump.

There is no need to wire increment_version.sh into the drafter - release-drafter already does the same computation, sourced from PR labels instead of a hard-coded flag, which is actually closer to "intelligent enough to figure out the next version" than a fixed -p invocation. The dilemma described offline is resolved by fix 2 in the PR description; no additional action or plugin is required.

@jamesfredley jamesfredley requested review from matrei and sbglasius May 22, 2026 17:16
@testlens-app

testlens-app Bot commented May 22, 2026

✅ All tests passed ✅

🏷️ Commit: 182a751
▶️ Tests: 4187 executed
⚪️ Checks: 33/33 completed

